NSF PAR Search | NSF Public Access Repository

ThinkGuard: Deliberative Slow Thinking Leads to Cautious Guardrails

https://doi.org/10.18653/v1/2025.findings-acl.704

Wen, Xiaofei; Zhou, Wenxuan; Mo, Wenjie Jacky; Chen, Muhao (January 2025, Association for Computational Linguistics)

Full Text Available
Test-time Backdoor Mitigation for Black-Box Large Language Models with Defensive Demonstrations

https://doi.org/10.18653/v1/2025.findings-naacl.119

Mo, Wenjie Jacky; Xu, Jiashu; Liu, Qin; Wang, Jiongxiao; Yan, Jun; Askari, Hadi; Xiao, Chaowei; Chen, Muhao (January 2025, Association for Computational Linguistics)

Full Text Available
Learning from Active Human Involvement through Proxy Value Propagation

Peng, Zhenghao; Mo, Wenjie; Duan, Chenda; Li, Quanyi; Zhou, Bolei (December 2023, Advances in Neural Information Processing Systems)

Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training. The interaction and corrective feedback from human brings safety and AI alignment to the learning process. In this work, we propose a new reward-free active human involvement method called Proxy Value Propagation for policy optimization. Our key insight is that a proxy value function can be designed to express human intents, wherein state-action pairs in the human demonstration are labeled with high values, while those agents’ actions that are intervened receive low values. Through the TD-learning framework, labeled values of demonstrated state-action pairs are further propagated to other unlabeled data generated from agents’ exploration. The proxy value function thus induces a policy that faithfully emulates human behaviors. Human-in-the-loop experiments show the generality and efficiency of our method. With minimal modification to existing reinforcement learning algorithms, our method can learn to solve continuous and discrete control tasks with various human control devices, including the challenging task of driving in Grand Theft Auto V. Demo video and code are available at: https://metadriverse.github.io/pvp.
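The labeling-and-propagation idea in this abstract can be illustrated with a small tabular sketch. This is not the authors' implementation: the label values of +1 and -1, the function name `proxy_value_propagation`, the learning rate, and the toy chain environment are all assumptions chosen for illustration. It only shows how reward-free TD updates might carry labeled proxy values from demonstrated state-action pairs to unlabeled exploration data.

```python
def proxy_value_propagation(human_pairs, exploration, n_states, n_actions,
                            gamma=0.9, lr=0.5, sweeps=200):
    """Tabular sketch of a proxy value function (hypothetical helper).

    human_pairs: (state, human_action, intervened_agent_action) triples.
    exploration: (state, action, next_state) triples with no reward signal.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        # Label step: pull demonstrated actions toward +1 and the
        # intervened agent actions toward -1 (assumed label values).
        for s, a_human, a_agent in human_pairs:
            q[s][a_human] += lr * (1.0 - q[s][a_human])
            q[s][a_agent] += lr * (-1.0 - q[s][a_agent])
        # Propagation step: reward-free TD update on exploration data.
        for s, a, s_next in exploration:
            target = gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


if __name__ == "__main__":
    # Toy chain with states 0..2 and actions 0 (left) / 1 (right).
    # The human took over at state 2, choosing "right" over the agent's "left".
    human_pairs = [(2, 1, 0)]
    exploration = [(0, 1, 1), (1, 1, 2), (1, 0, 0), (0, 0, 0)]
    q = proxy_value_propagation(human_pairs, exploration, n_states=3, n_actions=2)
    greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(3)]
    print(greedy)
```

In this toy setup, states 0 and 1 are never labeled by the human, yet their values for moving right end up higher because the TD target bootstraps from the labeled state.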
Full Text Available
ScenarioNet: Open-Source Platform for Large-Scale Traffic Scenario Simulation and Modeling

Li, Quanyi; Peng, Zhenghao; Feng, Lan; Liu, Zhizheng; Duan, Chenda; Mo, Wenjie; Zhou, Bolei (December 2023, Advances in Neural Information Processing Systems)

Large-scale driving datasets such as Waymo Open Dataset and nuScenes substantially accelerate autonomous driving research, especially for perception tasks such as 3D detection and trajectory forecasting. Since the driving logs in these datasets contain HD maps and detailed object annotations that accurately reflect the real-world complexity of traffic behaviors, we can harvest a massive number of complex traffic scenarios and recreate their digital twins in simulation. Compared to the hand-crafted scenarios often used in existing simulators, data-driven scenarios collected from the real world can facilitate many research opportunities in machine learning and autonomous driving. In this work, we present ScenarioNet, an open-source platform for large-scale traffic scenario modeling and simulation. ScenarioNet defines a unified scenario description format and collects a large-scale repository of real-world traffic scenarios from the heterogeneous data in various driving datasets including Waymo, nuScenes, Lyft L5, Argoverse, and nuPlan datasets. These scenarios can be further replayed and interacted with in multiple views from Bird-Eye-View layout to realistic 3D rendering in MetaDrive simulator. This provides a benchmark for evaluating the safety of autonomous driving stacks in simulation before their real-world deployment. We further demonstrate the strengths of ScenarioNet on large-scale scenario generation, imitation learning, and reinforcement learning in both single-agent and multi-agent settings. Code, demo videos, and website are available at https://metadriverse.github.io/scenarionet.
Full Text Available
A Causal View of Entity Bias in (Large) Language Models

https://doi.org/10.18653/v1/2023.findings-emnlp.1013

Wang, Fei; Mo, Wenjie; Wang, Yiwei; Zhou, Wenxuan; Chen, Muhao (January 2023, A Causal View of Entity Bias in (Large) Language Models)

Entity bias widely affects pretrained (large) language models, causing them to rely on (biased) parametric knowledge to make unfaithful predictions. Although causality-inspired methods have shown great potential to mitigate entity bias, it is hard to precisely estimate the parameters of underlying causal models in practice. The rise of black-box LLMs also makes the situation even worse, because of their inaccessible parameters and uncalibrated logits. To address these problems, we propose a specific structured causal model (SCM) whose parameters are comparatively easier to estimate. Building upon this SCM, we propose causal intervention techniques to mitigate entity bias for both white-box and black-box settings. The proposed causal intervention perturbs the original entity with neighboring entities. This intervention reduces specific biasing information pertaining to the original entity while still preserving sufficient semantic information from similar entities. Under the white-box setting, our training-time intervention improves OOD performance of PLMs on relation extraction (RE) and machine reading comprehension (MRC) by 5.7 points and by 9.1 points, respectively. Under the black-box setting, our in-context intervention effectively reduces the entity-based knowledge conflicts of GPT-3.5, achieving up to 20.5 points of improvement of exact match accuracy on MRC and up to 17.6 points of reduction in memorization ratio on RE.
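A rough sketch of the in-context perturbation described in this abstract might look like the following. It is an assumption-laden illustration, not the paper's method: the toy two-dimensional embeddings, the city names, the `ENTITY` placeholder, the prompt wording, and both helper functions are hypothetical. The sketch selects the nearest entities by cosine similarity and rewrites the input so that the original entity is replaced by a placeholder defined through those neighbors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neighboring_entities(entity, embeddings, k=2):
    """Return the k entities most similar to `entity` (hypothetical helper)."""
    target = embeddings[entity]
    others = [name for name in embeddings if name != entity]
    others.sort(key=lambda name: cosine(target, embeddings[name]), reverse=True)
    return others[:k]


def intervene(text, entity, embeddings, k=2, placeholder="ENTITY"):
    """Replace `entity` with a placeholder defined by its neighbors.

    The prompt wording below is an assumption made for illustration.
    """
    neighbors = neighboring_entities(entity, embeddings, k)
    definition = f"Assume that {placeholder} can be any of: {', '.join(neighbors)}."
    return definition + "\n" + text.replace(entity, placeholder)


if __name__ == "__main__":
    # Toy embeddings invented for this example only.
    toy_embeddings = {
        "Paris": [1.0, 0.1],
        "Lyon": [0.9, 0.2],
        "Marseille": [0.8, 0.3],
        "Tokyo": [0.1, 1.0],
    }
    print(intervene("Paris hosted the event in 1900.", "Paris", toy_embeddings))
```

The rewritten prompt no longer names the original entity, so a model answering from it has less entity-specific parametric knowledge to fall back on, while the neighbor list keeps the entity's rough semantic type.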